Abstract

In this paper we investigate the performance of Particle Swarm Optimization (PSO) based
clustering on four real-world data sets and one artificial data set. Performance is measured
by two metrics, namely quantization error and inter-cluster distance. The K-means clustering
algorithm is first run on all data sets, and its results form the basis of comparison
for the PSO-based approaches. We explore different variants of PSO, namely gbest, lbest ring,
lbest von Neumann and Hybrid PSO. The results show that the PSO-based
clustering algorithms achieve lower quantization error than K-means on all data sets.

1. INTRODUCTION

Data clustering is the process of grouping together 
similar multi-dimensional data vectors into a number of 
clusters or bins. Clustering algorithms have been 
applied to a wide range of problems, including 
exploratory data analysis, data mining [1], image 
segmentation [2] and mathematical programming [3,4].
Clustering techniques have been used successfully to 
address the scalability problem of machine learning and 
data mining algorithms. Clustering algorithms can be 
grouped into two main classes of algorithms, namely 
supervised and unsupervised. With supervised 
clustering, the learning algorithm has an external 
teacher that indicates the target class to which a data 
vector should belong. For unsupervised clustering, a 
teacher does not exist, and data vectors are grouped 
based on distance from one another. This paper focuses 
on unsupervised clustering. 

Many unsupervised clustering algorithms have
been developed. One such algorithm is K-means, which
is simple and straightforward and is based on the firm
foundation of analysis of variances. The main drawback
of the K-means algorithm is that its result is sensitive
to the selection of the initial cluster centroids and may
converge to a local optimum. PSO mitigates this problem
because it performs a global search for solutions.

This paper therefore explores the applicability of PSO
and its variants to clustering data vectors. The
objectives of the paper are:

• to show that the standard PSO algorithm can be 
used to cluster arbitrary data, and 

• to compare the performance of PSO and its variants 
with standard K-means algorithm. 



The rest of the paper is organized as follows.
Section 2 presents an overview of the K-means algorithm.
The basic PSO and its variants are discussed in Section
3. Function optimization using PSO models is covered in
Section 4. Clustering with PSO is
discussed in Section 5. Experimental results are
summarized in Section 6, and the conclusion and further
work are given in Section 7.

2. K-Means Clustering 

One of the most important components of a 
clustering algorithm is the measure of similarity used to 
determine how close two patterns are to one another.
K-means clustering groups data vectors into a predefined
number of clusters, based on Euclidean distance as the
similarity measure. Data vectors within a cluster have
small Euclidean distances from one another, and are 
associated with one centroid vector, which represents 
the "midpoint" of that cluster. The centroid vector is the 
mean of the data vectors that belong to the 
corresponding cluster. 

For the purpose of this paper, the following symbols are
defined:

• $N_d$ denotes the input dimension, i.e. the number
of parameters of each data vector

• $N_o$ denotes the number of data vectors to be
clustered

• $N_c$ denotes the number of cluster centroids (as
provided by the user), i.e. the number of clusters to
be formed

• $z_p$ denotes the p-th data vector

• $m_j$ denotes the centroid vector of cluster j

• $n_j$ is the number of data vectors in cluster j

• $C_j$ is the subset of data vectors that form cluster
j.

Using the above notation, the standard K-means
algorithm is summarized as

1. Randomly initialize the $N_c$ cluster centroid
vectors

2. Repeat

(a) For each data vector, assign the vector to the class
with the closest centroid vector, where the distance
to the centroid is determined using

$$d(z_p, m_j) = \sqrt{\sum_{k=1}^{N_d} (z_{pk} - m_{jk})^2} \qquad (1)$$

where k subscripts the dimension

(b) Recalculate the cluster centroid vectors, using

$$m_j = \frac{1}{n_j} \sum_{\forall z_p \in C_j} z_p \qquad (2)$$

until a stopping criterion is satisfied.

The K-means clustering process can be stopped
when any one of the following criteria is satisfied:
when the maximum number of iterations has been
exceeded, when there is little change in the centroid
vectors over a number of iterations, or when there are no
cluster membership changes. For the purposes of this
study, the algorithm is stopped when a user-specified
number of iterations has been exceeded.
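
The K-means procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function names, the choice of initial centroids (randomly picked data vectors) and the handling of empty clusters are assumptions made for the example. As in this study, the loop stops after a fixed number of iterations.

```python
import math
import random

def euclidean(z, m):
    """Equation (1): Euclidean distance between data vector z and centroid m."""
    return math.sqrt(sum((zk - mk) ** 2 for zk, mk in zip(z, m)))

def kmeans(data, n_clusters, max_iter=100, seed=0):
    """Plain K-means that stops after a fixed number of iterations."""
    rng = random.Random(seed)
    # Step 1 (assumption): use randomly chosen data vectors as initial centroids.
    centroids = [list(z) for z in rng.sample(data, n_clusters)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Step 2(a): assign every vector to its closest centroid.
        labels = [min(range(n_clusters), key=lambda j: euclidean(z, centroids[j]))
                  for z in data]
        # Step 2(b): move each centroid to the mean of its members, equation (2).
        for j in range(n_clusters):
            members = [z for z, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = kmeans(data, n_clusters=2, max_iter=10)
```

Because the result depends on the initial centroids, different seeds can lead to different partitions on less clearly separated data, which is the sensitivity discussed in the introduction.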

3. Particle Swarm Optimization and its variants 

Particle swarm optimization (PSO) is a 
population-based stochastic search process, modeled 
after the social behavior of a bird flock [5,6]. The 
algorithm maintains a population of particles, where 
each particle represents a potential solution to an 
optimization problem. In the context of PSO, a swarm 
refers to a number of potential solutions to the 
optimization problem, where each potential solution is 
referred to as a particle. The aim of the PSO is to find
the particle position that results in the best evaluation of
a given fitness (objective) function.

Each particle represents a position in $N_d$-dimensional
space, and is "flown" through this multi-dimensional
search space, adjusting its position toward
both

• the particle's best position found thus far, and

• the best position in the neighborhood of that
particle.

Each particle i maintains the following
information:

• $x_i$: The current position of the particle;

• $v_i$: The current velocity of the particle;

• $y_i$: The personal best position of the particle.

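Using the above notation, a particle's position is
adjusted according to

$$v_{i,k}(t+1) = w\,v_{i,k}(t) + c_1 r_{1,k}(t)\,(y_{i,k}(t) - x_{i,k}(t)) + c_2 r_{2,k}(t)\,(\hat{y}_k(t) - x_{i,k}(t)) \qquad (3)$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (4)$$

where $w$ is the inertia weight, $c_1$ and $c_2$ are the
acceleration constants, $r_{1,k}(t), r_{2,k}(t) \sim U(0,1)$ and
$k = 1, \ldots, N_d$. Here $\hat{y}$ is the best position found by
the swarm so far, and the random numbers $r_1$ and $r_2$ are
generated to avoid any biasing effect toward the social or
cognitive component.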

The velocity is thus calculated based on three 
contributions: (1) a fraction of the previous velocity, 
(2) the cognitive component which is a function of the 
distance of the particle from its personal best position, 
and (3) the social component which is a function of the 
distance of the particle from the best particle found thus 
far (i.e. the best of the personal bests).
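
The three contributions to the velocity can be made explicit in code. The following is a minimal sketch of one update of a single particle under equations (3) and (4); the function name and argument names are hypothetical, and drawing fresh random numbers for every dimension is an assumption of the sketch.

```python
import random

def update_particle(x, v, y, y_hat, w, c1, c2, rng=random):
    """Apply equations (3) and (4) once to a single particle.

    x: current position, v: current velocity,
    y: personal best position, y_hat: best position in the neighborhood.
    """
    new_x, new_v = [], []
    for k in range(len(x)):
        r1, r2 = rng.random(), rng.random()    # U(0,1) draws for this dimension
        inertia = w * v[k]                     # (1) fraction of previous velocity
        cognitive = c1 * r1 * (y[k] - x[k])    # (2) pull toward personal best
        social = c2 * r2 * (y_hat[k] - x[k])   # (3) pull toward neighborhood best
        vk = inertia + cognitive + social
        new_v.append(vk)
        new_x.append(x[k] + vk)                # equation (4)
    return new_x, new_v

# A particle already sitting on both best positions only keeps its inertia term.
x1, v1 = update_particle([1.0, 2.0], [0.5, -0.5], [1.0, 2.0], [1.0, 2.0],
                         w=0.9, c1=1.042, c2=1.042)
```

In the usage line the cognitive and social terms vanish, so the new velocity is simply w times the old one.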

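The personal best position of particle i is
calculated as

$$y_i(t+1) = \begin{cases} y_i(t) & \text{if } f(x_i(t+1)) \ge f(y_i(t)) \\ x_i(t+1) & \text{if } f(x_i(t+1)) < f(y_i(t)) \end{cases} \qquad (5)$$

where $f$ is the fitness function to be minimized.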

Two basic approaches to PSO exist based on the
interpretation of the neighborhood of particles.
Equation (3) reflects the gbest version of PSO where,
for each particle, the neighborhood is simply the entire
swarm. The social component then causes particles to
be drawn towards the best particle in the swarm. In the
lbest PSO model, the swarm is divided into overlapping
neighborhoods, and the best particle of each
neighborhood is determined. For the lbest PSO model,
the social component of equation (3) changes to

$$c_2 r_{2,k}(t)\,(\hat{y}_{i,k}(t) - x_{i,k}(t)) \qquad (6)$$

where $\hat{y}_i$ is the best particle in the neighborhood
of the i-th particle.

The PSO is usually executed with repeated
application of equations (3) and (4) until a specified
number of iterations has been exceeded. Alternatively,
the algorithm can be terminated when the velocity updates are
close to zero over a number of iterations. lbest ring is a
variant of PSO in which the neighborhood best is determined with
respect to the adjacent particles on a ring, as shown in
Figure 1.




Figure 1 - Ring architecture 



In the von Neumann architecture the particles are
arranged in a two-dimensional matrix, and the neighborhood best of
a particle is determined with respect to its four
adjacent particles, as shown in Figure 2.
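
The two neighborhood structures can be expressed as index computations. The sketch below is illustrative only: it assumes that particles are numbered from 0, that a particle belongs to its own neighborhood, that the ring uses the two immediately adjacent particles, and that the edges of the von Neumann grid wrap around. None of these details are stated in the text, and the function names are hypothetical.

```python
def ring_neighbors(i, n_particles):
    """Particle i plus its two adjacent particles on a ring."""
    return [(i - 1) % n_particles, i, (i + 1) % n_particles]

def von_neumann_neighbors(i, n_rows, n_cols):
    """Particle i plus the particles above, below, left and right of it
    on an n_rows x n_cols grid whose edges wrap around (assumption)."""
    r, c = divmod(i, n_cols)
    return [
        i,
        ((r - 1) % n_rows) * n_cols + c,   # up
        ((r + 1) % n_rows) * n_cols + c,   # down
        r * n_cols + (c - 1) % n_cols,     # left
        r * n_cols + (c + 1) % n_cols,     # right
    ]

def neighborhood_best(neighbors, pbest_fitness):
    """Index of the neighbor with the lowest (best) personal-best fitness."""
    return min(neighbors, key=lambda j: pbest_fitness[j])

# Example: particle 5 on a 3 x 4 grid of 12 particles.
grid_example = von_neumann_neighbors(5, n_rows=3, n_cols=4)
```

The index returned by `neighborhood_best` selects the position that plays the role of the neighborhood best in the lbest social component.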



4. PSO as a Tool for Function Optimization

PSO can be applied to a number of real-world
problems such as optimization, a field that has been expanding
rapidly. We therefore first use PSO to optimize several
complex functions. We applied the different variants of PSO,
namely lbest (ring and von Neumann architectures)
[7,8] and gbest, to the standard
benchmark functions given in Table I [7], with the
range of search and maximum velocities given in Table II.
The corresponding results are given in Table III.

Table I: Benchmarks for simulations

Table II: Range of search and Maximum Velocity

Figure 2 - Von-Neumann architecture

Table III: Results

From the above results it can be seen that PSO
is a very good candidate for solving optimization
problems. Data clustering can itself be viewed as an
optimization problem in which the objective is to group
similar data objects into the same cluster. In our work
PSO is used to pursue this objective.

5. PSO Clustering 

In the context of clustering, a single particle represents
the $N_c$ cluster centroid vectors. That is, each particle
$x_i$ is constructed as follows:

$$x_i = (m_{i1}, \ldots, m_{ij}, \ldots, m_{iN_c}) \qquad (7)$$

where $m_{ij}$ refers to the j-th cluster centroid vector
of the i-th particle in cluster $C_{ij}$. Therefore, a swarm
represents a number of candidate clusterings for the current
data vectors. The fitness of particles is easily measured
as the quantization error,

$$J_e = \frac{\sum_{j=1}^{N_c} \left[ \sum_{\forall z_p \in C_{ij}} d(z_p, m_{ij}) / |C_{ij}| \right]}{N_c} \qquad (8)$$

where d is defined in equation (1), and $|C_{ij}|$ is
the number of data vectors belonging to cluster $C_{ij}$, i.e.
the frequency of that cluster.
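
As an illustration, the quantization error of one particle can be computed as in the following sketch. The particle is passed as a list of centroid vectors; the function names are hypothetical, and letting an empty cluster contribute zero (rather than dividing by zero) is an assumption, since the text does not say how empty clusters are treated.

```python
import math

def euclidean(z, m):
    """Equation (1): Euclidean distance between data vector z and centroid m."""
    return math.sqrt(sum((zk - mk) ** 2 for zk, mk in zip(z, m)))

def quantization_error(centroids, data):
    """Quantization error of one particle, given as a list of centroids."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for z in data:
        dists = [euclidean(z, m) for m in centroids]
        j = dists.index(min(dists))   # closest centroid defines the cluster
        sums[j] += dists[j]
        counts[j] += 1
    # Assumption: an empty cluster contributes 0 to the error.
    per_cluster = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return sum(per_cluster) / len(centroids)

# Cluster 1 has mean distance 1, cluster 2 has mean distance 2.
example_error = quantization_error(
    [(0.0, 1.0), (10.0, 2.0)],
    [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)],
)
```

In the example the two per-cluster mean distances are 1 and 2, so the error is their average, 1.5.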

This section presents a standard PSO for 
clustering data into a given number of clusters. 

5.1 PSO Cluster Algorithm 

Using the standard gbest PSO, data vectors can be clustered
as follows:

1. Initialize each particle to contain $N_c$ randomly
selected cluster centroids.

2. For t = 1 to $t_{max}$ do
(a) For each particle i do
(b) For each data vector $z_p$

i) Calculate the Euclidean distance $d(z_p, m_{ij})$ to
all cluster centroids $C_{ij}$

ii) Assign $z_p$ to cluster $C_{ij}$ such that
$d(z_p, m_{ij}) = \min_{c = 1, \ldots, N_c} \{ d(z_p, m_{ic}) \}$

iii) Calculate the fitness using equation (8)

(c) Update the global best and local best positions

(d) Update the cluster centroids using equations
(3) and (4)

where $t_{max}$ is the maximum number of iterations.
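
The listing above can be sketched in Python as follows. This is a simplified illustration and not the code used for the experiments: each particle is stored as a flat list of centroid coordinates, the inertia weight is kept constant at an assumed value of 0.72 instead of being varied, empty clusters are assumed to contribute zero to the fitness, and all function and parameter names are hypothetical. The fitness is evaluated once at initialization and then after each position update, which orders the steps slightly differently from the listing but performs the same operations.

```python
import math
import random

def euclidean(z, m):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z, m)))

def fitness(position, data, n_clusters):
    """Quantization error, equation (8), of a flat particle position."""
    n_dims = len(data[0])
    centroids = [position[j * n_dims:(j + 1) * n_dims] for j in range(n_clusters)]
    sums, counts = [0.0] * n_clusters, [0] * n_clusters
    for z in data:
        dists = [euclidean(z, m) for m in centroids]   # step (b) i
        j = dists.index(min(dists))                    # step (b) ii
        sums[j] += dists[j]
        counts[j] += 1
    # Assumption: empty clusters contribute 0 to the error.
    return sum(s / n for s, n in zip(sums, counts) if n) / n_clusters

def pso_cluster(data, n_clusters, n_particles=10, t_max=100,
                w=0.72, c1=1.042, c2=1.042, seed=0):
    rng = random.Random(seed)
    # Step 1: each particle holds n_clusters randomly chosen data vectors.
    xs = [[v for z in rng.sample(data, n_clusters) for v in z]
          for _ in range(n_particles)]
    vs = [[0.0] * len(xs[0]) for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x, data, n_clusters) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(t_max):
        for i in range(n_particles):
            # Step (d): move the centroids with equations (3) and (4).
            for k in range(len(xs[i])):
                r1, r2 = rng.random(), rng.random()
                vs[i][k] = (w * vs[i][k]
                            + c1 * r1 * (pbest[i][k] - xs[i][k])
                            + c2 * r2 * (gbest[k] - xs[i][k]))
                xs[i][k] += vs[i][k]
            # Steps (b) iii and (c): evaluate, then update the best positions.
            f = fitness(xs[i], data, n_clusters)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = xs[i][:], f
    return gbest, gbest_fit

data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
best_position, best_error = pso_cluster(data, n_clusters=2, t_max=30)
```

The returned position holds the centroids of the best particle one after another, so the first two values are the first centroid and the next two the second.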

The population-based search of the PSO algorithm 
reduces the effect that initial conditions have, as 
opposed to the K-means algorithm; the search starts 
from multiple positions in parallel. Section 6 shows that 
the PSO algorithm performs better than the K-means 
algorithm in terms of quantization error.

6. Data Set and Experimental Results

This section compares the results of the K-means 
and PSO algorithms on five clustering problems. The 
main purpose is to compare the quality of the respective 
clustering, where quality is measured according to the 
following two criteria: 

• the quantization error as defined in equation (8); 

• the inter-cluster distances, i.e. the distance 
between the centroids of the clusters, where the 
objective is to maximize the distance between 
clusters. 
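
The second criterion can be illustrated with a short sketch. The text does not state how the distances are aggregated when there are more than two centroids, so the sketch assumes the mean Euclidean distance over all pairs of centroids; this aggregation and the function name are assumptions and may differ from the measure used for the reported results.

```python
import math
from itertools import combinations

def inter_cluster_distance(centroids):
    """Mean Euclidean distance over all pairs of centroids (assumed aggregation)."""
    pairs = list(combinations(centroids, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Two centroids: the measure is simply the distance between them.
two_centroids = inter_cluster_distance([(0.0, 0.0), (3.0, 4.0)])
```

A larger value means the centroids lie further apart, which is the direction the comparison treats as better.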

For all the results reported, averages over 30
simulations are given. All algorithms are run for 1000
function evaluations, and the PSO algorithms use 10
particles. The Hybrid PSO takes its seed from the result of
K-means clustering: this seed is used as one
particle in the PSO swarm. For PSO, w is
varied as described in [9]: the initial weight is fixed
at 0.9 and the final weight at 0.4. The acceleration
coefficients c1 and c2 are fixed at 1.042 to ensure good
convergence [10].
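
The two settings described above, the varying inertia weight and the K-means seed of the Hybrid PSO, can be sketched as follows. The text gives only the initial and final weights, so the linear decrease between them is an assumption, as is the choice of replacing the first particle with the seed; the function names are hypothetical.

```python
def inertia_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight at iteration t (0-based), assuming a linear decrease."""
    if t_max <= 1:
        return w_start
    return w_start - (w_start - w_end) * t / (t_max - 1)

def seed_swarm(swarm, kmeans_centroids):
    """Copy of the swarm whose first particle is the flattened K-means result.

    Which particle is replaced is an assumption; the text only says that the
    seed is used as one particle of the swarm.
    """
    seeded = [p[:] for p in swarm]
    seeded[0] = [v for m in kmeans_centroids for v in m]
    return seeded

weights = [inertia_weight(t, 100) for t in range(100)]
seeded = seed_swarm([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
                    [(2.0, 3.0), (4.0, 5.0)])
```

The remaining particles are left untouched, so the swarm still starts from several positions in parallel.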

The clustering problems used for the purpose of 
this paper are: 

• Iris plants database: This is a well-understood 
database with 4 inputs, 3 classes and 150 data 
vectors. 

• Wine: This is a classification problem with "well 
behaved" class structures. There are 13 inputs, 3 
classes and 178 data vectors. 

• Hayes Roth: This data set has 5 inputs, 3
classes and 132 data vectors.

• Diabetes: This data set has 8 inputs, 2
classes and 768 data vectors.

• Artificial: This problem follows the
classification rule

$$\text{class} = \begin{cases} 1 & \text{if } (z_1 > 0.7) \text{ or } ((z_1 < 0.3) \text{ and } (z_2 > -0.2 - z_1)) \\ 2 & \text{otherwise} \end{cases}$$

A total of 400 data vectors are randomly created
between (-1,1).
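
A data set of this kind could be generated as in the sketch below. It assumes two inputs, z1 and z2, each drawn uniformly from (-1, 1), and it labels the two classes 1 and 2; the sampling distribution, the strict inequalities and the function names are assumptions of the example.

```python
import random

def artificial_class(z1, z2):
    """Classification rule of the artificial problem."""
    if (z1 > 0.7) or ((z1 < 0.3) and (z2 > -0.2 - z1)):
        return 1
    return 2

def make_artificial(n=400, seed=0):
    """Draw n two-dimensional vectors (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    data = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(n)]
    labels = [artificial_class(z1, z2) for z1, z2 in data]
    return data, labels

art_data, art_labels = make_artificial()
```

The class labels are not used by the clustering algorithms themselves, which are unsupervised; they only describe how the data set is structured.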

Table IV summarizes the results obtained from the
five clustering algorithms for the problems cited above.
The values reported are averages over 30 simulations,
with standard deviations to indicate the range of values
to which the algorithms converge. First, consider the
fitness of solutions, i.e. the quantization error: for all
data sets, PSO-based clustering is better than K-means.
Among the PSO variants, lbest von Neumann gives the lowest
quantization error on Iris, Diabetes and Artificial, and the
largest inter-cluster distance on Hayes Roth, Diabetes and
Artificial. For Wine and Hayes Roth, Hybrid PSO gives the
lowest quantization error; lbest von Neumann has a higher
quantization error there but a
comparatively good inter-cluster distance measure for
these data sets. The standard deviations (std) are found to
be very close, suggesting that the algorithms converge
consistently to these results.

Table IV: Results of clustering

Data set     Algorithm            Quantization error       Inter-cluster distance
                                  (mean, std)              (mean, std)

Iris         K-means              0.733, 0.21075           8.3238, 1.6821
             PSO gbest            0.027209, 0.017964       19.41, 2.4693
             lbest ring           0.026615, 0.014664       21.079, 3.8171
             lbest von Neumann    0.012477, 0.019458       20.278, 0.02204
             Hybrid PSO           0.68743, 0.019081        18.598, 0.65266

Hayes Roth   K-means              11.961, 1.573            8.9353, 1.2419
             PSO gbest            0.77086, 0.0408          324.25, 5.7895
             lbest ring           3.99, 3.3429             313.1, 3.4562
             lbest von Neumann    3.8265, 0.98856          350.73, 23.272
             Hybrid PSO           0.57914, 1.9488          323.51, 61.738

Wine         K-means              116.29, 0.83715          2019.2, 0.234
             PSO gbest            10.765, 3.7278           3272.8, 292.89
             lbest ring           33.622, 7.6328           2859.7, 339.91
             lbest von Neumann    11.709, 1.6749           3450.8, 222.42
             Hybrid PSO           3.9663, 4.3043           3596.7, 483.11

Diabetes     K-means              78.984, 7.6654           20.92, 3.332
             PSO gbest            69.222, 2.4839           30.12, 2.719
             lbest ring           36.98, 2.397             36.108, 2.475
             lbest von Neumann    33.205, 2.8501           38.1074, 2.4714
             Hybrid PSO           48.545, 3.097            32.958, 3.471

Artificial   K-means              0.64152, 0.011867        1.3772, 0.02580
             PSO gbest            0.54338, 0.0057227       1.2678, 0.72643
             lbest ring           0.56021, 0.004647        1.482, 0.13267
             lbest von Neumann    0.5317, 0.00121          1.662, 0.11
             Hybrid PSO           0.55086, 0.00056684      0.9086, 0.16526

7. CONCLUSION 

This paper investigates the application of PSO
to clustering data vectors. Five algorithms were tested,
namely standard K-means, gbest PSO, lbest ring,
lbest von Neumann and Hybrid PSO. The PSO
approaches were compared against K-means clustering;
the comparison showed that the PSO approaches
converge to lower quantization errors and, in
general, larger inter-cluster distances. Future studies
will involve more elaborate tests on higher-dimensional
problems and larger numbers of patterns. The PSO
clustering algorithms will also be extended to
dynamically determine the optimal number of clusters.